BUG: pd.MultiIndex.get_loc(np.nan)#28783
Conversation
|
@toobaz care to take a look? |
doc/source/whatsnew/v1.0.0.rst
Outdated
| - Bug in reindexing a :meth:`PeriodIndex` with another type of index that contained a `Period` (:issue:`28323`) (:issue:`28337`) | ||
| - Fix assignment of column via `.loc` with numpy non-ns datetime type (:issue:`27395`) | ||
| - Bug in :meth:`Float64Index.astype` where ``np.inf`` was not handled properly when casting to an integer dtype (:issue:`28475`) | ||
| - Bug in :meth:`MultiIndex.get_loc` can't find ``np.nan`` when input is NA value (:issue:`19132`) |
There was a problem hiding this comment.
Bug in ..... with a null value as input
There was a problem hiding this comment.
do we have a user visible effect of this? e.g. in actually indexing a Series/DataFrame.
There was a problem hiding this comment.
@jreback Well.... .loc can find nan with a null value as input. Also .get_loc can find values which include nan. But only when input is a null value, can't find indexer. So i think working .get_loc with a null value correctly also makes sense.
There was a problem hiding this comment.
sure, but .get_loc is not a very visible user visible function, what I mean is docs .loc[] on something in particular now work when it did not before?
There was a problem hiding this comment.
@jreback I change whatsnew. I try to pass on what i intend more clearly. Okay I understand. I agree that get_loc is not a very visible user visible function. But eventually, get_loc is used for index object not for Series/DataFrame. What i concern is explanation with indexing a Sereis/DataFrame might tarnish its intention.
toobaz
left a comment
There was a problem hiding this comment.
I didn't have time to test with level > 0, sorry if my comments are a bit vague on this point. I think the comment on the nested method holds, though.
| # The label is present in self.levels[level] but unused: | ||
| raise KeyError(key) | ||
| return slice(i, j) | ||
|
|
There was a problem hiding this comment.
Do you really need this nested method? Unless I'm mistaken, you can just put the code = level_index.get_loc(key) in a conditional statement.
There was a problem hiding this comment.
@toobaz there is code = level_index.get_loc(key) after else statement. This code existed before. I just nest code and put it upper place. Because nesting it shows more clearly what code is for.
There was a problem hiding this comment.
there is
code = level_index.get_loc(key)after else statement. This code existed before.
Yes, that's what I was referring to: this PR could have just replaced that with
if not is_list_like(key) and isna(key):
code = -1
else:
code = level_index.get_loc(key)right?
Because nesting it shows more clearly what code is for.
In my (limited) experience, nested functions make code less readable, and harder to debug. If we want to isolate this code, I would prefer with a separate (better documented) function. But maybe it's a matter of taste, @jreback what's your take on this?
doc/source/whatsnew/v1.0.0.rst
Outdated
| - Bug in reindexing a :meth:`PeriodIndex` with another type of index that contained a `Period` (:issue:`28323`) (:issue:`28337`) | ||
| - Fix assignment of column via `.loc` with numpy non-ns datetime type (:issue:`27395`) | ||
| - Bug in :meth:`Float64Index.astype` where ``np.inf`` was not handled properly when casting to an integer dtype (:issue:`28475`) | ||
| - Bug in :meth:`MultiIndex.get_loc` can't find ``np.nan`` with a null value as input (:issue:`19132`) |
There was a problem hiding this comment.
I would mention "partial indexing" and "first level" somewhere - assuming this is really specific to first level.
There was a problem hiding this comment.
@toobaz When 'get_loc' argument is just one, 'get_loc' in 'MultiIndex' searches at first level before i send PR. I think there was an implicit agreement, so i didn't put in whatsnew.
There was a problem hiding this comment.
I would have expected a fix to #19132 to also solve the case mi.get_loc((1.0, np.nan)). This PR doesn't, fair enough, but I would make it slightly more explicit. Just "can't find np.nan in first level" is probably enough.
There was a problem hiding this comment.
@toobaz. I missunderstood issue. I thought just want to fix with a one nan. Okay. It's my fault. I want to fix completely. I try to figure out and fix all.
There was a problem hiding this comment.
OK, as you prefer. I'm not saying the aim of this PR was not good. Then sure, if you are able to provide a more general solution, that's great.
There was a problem hiding this comment.
@toobaz Thank you for your review and pointing out sharply
0ff4f88 to
1db7dd4
Compare
MultiIndex.get_loc can't find nan with a null value as input
MultiIndex.get_loc can find nan when input is NA value(e.g. nan, None)
black pandasgit diff upstream/master -u -- "*.py" | flake8 --diff